import os
# Pin this notebook to GPU index 2 on the author's multi-GPU machine.
os.environ['CUDA_VISIBLE_DEVICES'] = '2'
Fine-tuning Florence-2 on Object Detection Dataset
Florence-2 is a lightweight vision-language model open-sourced by Microsoft under the MIT license. The model demonstrates strong zero-shot and fine-tuning capabilities across tasks such as captioning, object detection, grounding, and segmentation.

Figure 1. Illustration showing the level of spatial hierarchy and semantic granularity expressed by each task. Source: Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks.
The model takes images and task prompts as input, generating the desired results in text format. It uses a DaViT vision encoder to convert images into visual token embeddings. These are then concatenated with BERT-generated text embeddings and processed by a transformer-based multi-modal encoder-decoder to generate the response.

Figure 2. Overview of Florence-2 architecture. Source: Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks.
Setup
Configure your API keys
To fine-tune Florence-2, you need to provide your HuggingFace Token and Roboflow API key. Follow these steps:
- Open your HuggingFace Settings page. Click Access Tokens, then New Token to generate a new token.
- Go to your Roboflow Settings page. Click Copy. This will place your private key in the clipboard.
- In Colab, go to the left pane and click on Secrets (🔑).
  - Store the HuggingFace Access Token under the name HF_TOKEN.
  - Store the Roboflow API Key under the name ROBOFLOW_API_KEY.
Select the runtime
Let’s make sure that we have access to a GPU. We can use the nvidia-smi command to check. If there are any problems, navigate to Edit -> Notebook settings -> Hardware accelerator, set it to L4 GPU, and then click Save.
!nvidia-smi
Sun Feb 16 17:52:44 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 565.57.01 Driver Version: 565.57.01 CUDA Version: 12.7 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA A100-SXM4-80GB On | 00000000:01:00.0 Off | 0 |
| N/A 54C P0 101W / 500W | 19377MiB / 81920MiB | 99% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
| 1 NVIDIA A100-SXM4-80GB On | 00000000:41:00.0 Off | 0 |
| N/A 42C P0 78W / 500W | 12139MiB / 81920MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
| 2 NVIDIA A100-SXM4-80GB On | 00000000:81:00.0 Off | 0 |
| N/A 58C P0 325W / 500W | 19831MiB / 81920MiB | 94% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
| 3 NVIDIA A100-SXM4-80GB On | 00000000:C1:00.0 Off | 0 |
| N/A 33C P0 62W / 500W | 5MiB / 81920MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| 0 N/A N/A 2629326 C ...naconda3/envs/zeel_py310/bin/python 19366MiB |
| 1 N/A N/A 2616743 C ...onda3/envs/shataxi_space/bin/python 12128MiB |
| 2 N/A N/A 2627487 C ...naconda3/envs/zeel_py310/bin/python 19820MiB |
+-----------------------------------------------------------------------------------------+
Download example data
NOTE: Feel free to replace our example image with your own photo.
!wget -q https://media.roboflow.com/notebooks/examples/dog.jpeg
!ls -lh
total 12M
-rw-rw-r-- 1 patel_zeel patel_zeel 104K Jun 2 2023 dog.jpeg
-rw-rw-r-- 1 patel_zeel patel_zeel 104K Jun 2 2023 dog.jpeg.1
-rw-rw-r-- 1 patel_zeel patel_zeel 104K Jun 2 2023 dog.jpeg.2
-rw-rw-r-- 1 patel_zeel patel_zeel 104K Jun 2 2023 dog.jpeg.3
-rw-rw-r-- 1 patel_zeel patel_zeel 104K Jun 2 2023 dog.jpeg.4
-rw-rw-r-- 1 patel_zeel patel_zeel 104K Jun 2 2023 dog.jpeg.5
-rw-rw-r-- 1 patel_zeel patel_zeel 104K Jun 2 2023 dog.jpeg.6
-rw-rw-r-- 1 patel_zeel patel_zeel 104K Jun 2 2023 dog.jpeg.7
-rw-rw-r-- 1 patel_zeel patel_zeel 3.0M Feb 16 17:42 'how-to-finetune-florence-2-on-detection-dataset copy 2.ipynb'
-rw-rw-r-- 1 patel_zeel patel_zeel 2.6M Feb 16 17:52 'how-to-finetune-florence-2-on-detection-dataset copy.ipynb'
-rw-rw-r-- 1 patel_zeel patel_zeel 2.6M Feb 16 17:48 how-to-finetune-florence-2-on-detection-dataset.ipynb
drwxrwxr-x 6 patel_zeel patel_zeel 4.0K Feb 16 17:52 model_checkpoints
drwxrwxr-x 5 patel_zeel patel_zeel 4.0K Feb 16 17:29 poker-cards-4
-rw-rw-r-- 1 patel_zeel patel_zeel 2.5M Jan 22 10:51 scratchpad.ipynb
EXAMPLE_IMAGE_PATH = "dog.jpeg"
Download and configure the model
Let’s download the model checkpoint and configure it so that you can fine-tune it later on.
# !pip install -q transformers flash_attn timm einops peft
# !pip install -q roboflow git+https://github.com/roboflow/supervision.git
# @title Imports
import io
import os
import re
import json
import torch
import html
import base64
import itertools
import numpy as np
import supervision as sv
# from google.colab import userdata
from IPython.display import display, HTML
from torch.utils.data import Dataset, DataLoader
from transformers import (
    AdamW,
    AutoModelForCausalLM,
    AutoProcessor,
    get_scheduler
)
from tqdm import tqdm
from typing import List, Dict, Any, Tuple, Generator
from peft import LoraConfig, get_peft_model
from PIL import Image
from roboflow import Roboflow
Load the model using AutoModelForCausalLM and the processor using AutoProcessor classes from the transformers library. Note that you need to pass trust_remote_code as True since this model is not a standard transformers model.
CHECKPOINT = "microsoft/Florence-2-base-ft"
# REVISION = 'refs/pr/6'
DEVICE = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = AutoModelForCausalLM.from_pretrained(CHECKPOINT, trust_remote_code=True).to(DEVICE)
processor = AutoProcessor.from_pretrained(CHECKPOINT, trust_remote_code=True)
Importing from timm.models.layers is deprecated, please import via timm.layers
Florence2LanguageForConditionalGeneration has generative capabilities, as `prepare_inputs_for_generation` is explicitly overwritten. However, it doesn't directly inherit from `GenerationMixin`. From 👉v4.50👈 onwards, `PreTrainedModel` will NOT inherit from `GenerationMixin`, and this model will lose the ability to call `generate` and other related functions.
- If you're using `trust_remote_code=True`, you can get rid of this warning by loading the model with an auto class. See https://huggingface.co/docs/transformers/en/model_doc/auto#auto-classes
- If you are the owner of the model architecture code, please modify your model class such that it inherits from `GenerationMixin` (after `PreTrainedModel`, otherwise you'll get an exception).
- If you are not the owner of the model architecture class, please contact the model code owner to update it.
Run inference with pre-trained Florence-2 model
# @title Example object detection inference
image = Image.open(EXAMPLE_IMAGE_PATH)
task = "<OD>"
text = "<OD>"
inputs = processor(text=text, images=image, return_tensors="pt").to(DEVICE)
generated_ids = model.generate(
input_ids=inputs["input_ids"],
pixel_values=inputs["pixel_values"],
max_new_tokens=1024,
num_beams=3
)
generated_text = processor.batch_decode(generated_ids, skip_special_tokens=False)[0]
response = processor.post_process_generation(generated_text, task=task, image_size=(image.width, image.height))
detections = sv.Detections.from_lmm(sv.LMM.FLORENCE_2, response, resolution_wh=image.size)
# `BoundingBoxAnnotator` is deprecated in supervision and has been renamed to `BoxAnnotator`.
box_annotator = sv.BoxAnnotator(color_lookup=sv.ColorLookup.INDEX)
label_annotator = sv.LabelAnnotator(color_lookup=sv.ColorLookup.INDEX)
image = box_annotator.annotate(image, detections)
image = label_annotator.annotate(image, detections)
image.thumbnail((600, 600))
image

# @title Example image captioning inference
image = Image.open(EXAMPLE_IMAGE_PATH)
task = "<DETAILED_CAPTION>"
text = "<DETAILED_CAPTION>"
inputs = processor(text=text, images=image, return_tensors="pt").to(DEVICE)
generated_ids = model.generate(
input_ids=inputs["input_ids"],
pixel_values=inputs["pixel_values"],
max_new_tokens=1024,
num_beams=3
)
generated_text = processor.batch_decode(generated_ids, skip_special_tokens=False)[0]
response = processor.post_process_generation(generated_text, task=task, image_size=(image.width, image.height))
response
{'<DETAILED_CAPTION>': 'In this image we can see a person wearing a bag and holding a dog. In the background there are buildings, poles and sky with clouds.'}
# @title Example caption to phrase grounding inference
image = Image.open(EXAMPLE_IMAGE_PATH)
task = "<CAPTION_TO_PHRASE_GROUNDING>"
text = "<CAPTION_TO_PHRASE_GROUNDING> Vehicle"
inputs = processor(text=text, images=image, return_tensors="pt").to(DEVICE)
generated_ids = model.generate(
input_ids=inputs["input_ids"],
pixel_values=inputs["pixel_values"],
max_new_tokens=1024,
num_beams=3
)
generated_text = processor.batch_decode(generated_ids, skip_special_tokens=False)[0]
response = processor.post_process_generation(generated_text, task=task, image_size=(image.width, image.height))
detections = sv.Detections.from_lmm(sv.LMM.FLORENCE_2, response, resolution_wh=image.size)
# `BoundingBoxAnnotator` is deprecated in supervision and has been renamed to `BoxAnnotator`.
box_annotator = sv.BoxAnnotator(color_lookup=sv.ColorLookup.INDEX)
label_annotator = sv.LabelAnnotator(color_lookup=sv.ColorLookup.INDEX)
image = box_annotator.annotate(image, detections)
image = label_annotator.annotate(image, detections)
image.thumbnail((600, 600))
image

Fine-tune Florence-2 on custom dataset
Download dataset from Roboflow Universe
ROBOFLOW_API_KEY = os.getenv("ROBOFLOW_API_KEY")
rf = Roboflow(api_key=ROBOFLOW_API_KEY)
project = rf.workspace("roboflow-jvuqo").project("poker-cards-fmjio")
version = project.version(4)
dataset = version.download("florence2-od")
loading Roboflow workspace...
loading Roboflow project...
!head -n 5 {dataset.location}/train/annotations.jsonl
{"image":"IMG_20220316_172418_jpg.rf.e3cb4a86dc0247e71e3697aa3e9db923.jpg","prefix":"<OD>","suffix":"9 of clubs<loc_138><loc_100><loc_470><loc_448>10 of clubs<loc_388><loc_145><loc_670><loc_453>jack of clubs<loc_566><loc_166><loc_823><loc_432>queen of clubs<loc_365><loc_465><loc_765><loc_999>king of clubs<loc_601><loc_440><loc_949><loc_873>"}
{"image":"IMG_20220316_171515_jpg.rf.e3b1932bb375b3b3912027647586daa8.jpg","prefix":"<OD>","suffix":"5 of clubs<loc_554><loc_2><loc_763><loc_467>6 of clubs<loc_399><loc_79><loc_555><loc_466>7 of clubs<loc_363><loc_484><loc_552><loc_905>8 of clubs<loc_535><loc_449><loc_757><loc_971>"}
{"image":"IMG_20220316_165139_jpg.rf.e30257ec169a2bfdfecb693211d37250.jpg","prefix":"<OD>","suffix":"9 of diamonds<loc_596><loc_535><loc_859><loc_982>jack of diamonds<loc_211><loc_546><loc_411><loc_880>queen of diamonds<loc_430><loc_34><loc_692><loc_518>king of diamonds<loc_223><loc_96><loc_451><loc_523>10 of diamonds<loc_387><loc_542><loc_604><loc_925>"}
{"image":"IMG_20220316_143407_jpg.rf.e1eb3be3efc6c3bbede436cfb5489e7c.jpg","prefix":"<OD>","suffix":"ace of hearts<loc_345><loc_315><loc_582><loc_721>2 of hearts<loc_709><loc_115><loc_888><loc_509>3 of hearts<loc_529><loc_228><loc_735><loc_613>4 of hearts<loc_98><loc_421><loc_415><loc_845>"}
{"image":"IMG_20220316_165139_jpg.rf.e4c229a9128494d17992cbe88af575df.jpg","prefix":"<OD>","suffix":"9 of diamonds<loc_141><loc_18><loc_404><loc_465>jack of diamonds<loc_589><loc_120><loc_789><loc_454>queen of diamonds<loc_308><loc_482><loc_570><loc_966>king of diamonds<loc_549><loc_477><loc_777><loc_904>10 of diamonds<loc_396><loc_75><loc_613><loc_458>"}
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using `tokenizers` before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
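Each `suffix` above packs every object into one string: a class label followed by four `<loc_N>` tokens, where N is a bin index between 0 and 999. The sketch below shows one way to turn such a string back into pixel boxes. It assumes the four tokens are ordered x1, y1, x2, y2 and that a bin maps to its center, `(N + 0.5) * size / 1000`; that assumption matches the model outputs printed later in this notebook (bin 0 gives 0.32 if the image is 640 px wide), but treat it as an approximation of the processor's own post-processing. The helper name `parse_od_suffix` is hypothetical.

```python
import re
from typing import List, Tuple

# A label (any text without '<') followed by exactly four location tokens.
LOC_PATTERN = re.compile(r"([^<]+)<loc_(\d+)><loc_(\d+)><loc_(\d+)><loc_(\d+)>")

Box = Tuple[float, float, float, float]


def parse_od_suffix(suffix: str, image_wh: Tuple[int, int], bins: int = 1000) -> List[Tuple[str, Box]]:
    """Decode a Florence-2 style detection string into (label, xyxy box in pixels)."""
    width, height = image_wh
    results = []
    for label, x1, y1, x2, y2 in LOC_PATTERN.findall(suffix):
        box = (
            (int(x1) + 0.5) * width / bins,   # assumed: bin center, not bin edge
            (int(y1) + 0.5) * height / bins,
            (int(x2) + 0.5) * width / bins,
            (int(y2) + 0.5) * height / bins,
        )
        results.append((label.strip(), box))
    return results


example = "9 of clubs<loc_138><loc_100><loc_470><loc_448>10 of clubs<loc_388><loc_145><loc_670><loc_453>"
parsed = parse_od_suffix(example, image_wh=(640, 640))  # 640x640 is an assumed image size
for label, box in parsed:
    print(label, [round(v, 2) for v in box])
```

A helper like this is handy for sanity-checking annotations before training, for example to confirm that every box has x1 < x2 and y1 < y2.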
# # read jsonl file
# def read_jsonl(file_path: str) -> Generator[Dict[str, Any], None, None]:
# with open(file_path, "r") as f:
# for line in f:
# yield json.loads(line)
# lines = []
# split = "test"
# for line in read_jsonl(dataset.location + f"/{split}/annotations.jsonl"):
# # print(line)
# # edit = True
# # copied_line = list(line['suffix'])
# # for i in range(len(copied_line)):
# # if copied_line[i] == "<":
# # edit = False
# # elif copied_line[i] == ">":
# # edit = True
# # else:
# # if edit:
# # copied_line[i] = chr(ord(copied_line[i]) + 1)
# # copied_line = "".join(copied_line)
# # line['suffix'] = copied_line
# line['suffix'] = line['suffix'].replace("club", "dog").replace("diamond", "cat").replace("heart", "bird").replace("spade", "fish")
# print(line)
# lines.append(line)
# with open(dataset.location + f"/{split}/annotations.jsonl", "w") as f:
# for line in lines:
# f.write(json.dumps(line) + "\n")
# @title Define `DetectionDataset` class
class JSONLDataset:
    """Reads a JSONL annotation file and loads the matching images lazily."""

    def __init__(self, jsonl_file_path: str, image_directory_path: str):
        self.jsonl_file_path = jsonl_file_path
        self.image_directory_path = image_directory_path
        self.entries = self._load_entries()

    def _load_entries(self) -> List[Dict[str, Any]]:
        entries = []
        with open(self.jsonl_file_path, 'r') as file:
            for line in file:
                data = json.loads(line)
                entries.append(data)
        return entries

    def __len__(self) -> int:
        return len(self.entries)

    def __getitem__(self, idx: int) -> Tuple[Image.Image, Dict[str, Any]]:
        if idx < 0 or idx >= len(self.entries):
            raise IndexError("Index out of range")
        entry = self.entries[idx]
        image_path = os.path.join(self.image_directory_path, entry['image'])
        try:
            image = Image.open(image_path)
            return (image, entry)
        except FileNotFoundError:
            raise FileNotFoundError(f"Image file {image_path} not found.")

class DetectionDataset(Dataset):
    """Wraps JSONLDataset and returns (prefix, suffix, image) triples."""

    def __init__(self, jsonl_file_path: str, image_directory_path: str):
        self.dataset = JSONLDataset(jsonl_file_path, image_directory_path)

    def __len__(self):
        return len(self.dataset)

    def __getitem__(self, idx):
        image, data = self.dataset[idx]
        prefix = data['prefix']
        suffix = data['suffix']
        return prefix, suffix, image
# @title Initiate `DetectionDataset` and `DataLoader` for train and validation subsets
BATCH_SIZE = 6
NUM_WORKERS = 0
def collate_fn(batch):
    # Tokenize the task prompts and preprocess the images as one padded batch.
    questions, answers, images = zip(*batch)
    inputs = processor(text=list(questions), images=list(images), return_tensors="pt", padding=True).to(DEVICE)
    return inputs, answers

train_dataset = DetectionDataset(
    jsonl_file_path = f"{dataset.location}/train/annotations.jsonl",
    image_directory_path = f"{dataset.location}/train/"
)
val_dataset = DetectionDataset(
    jsonl_file_path = f"{dataset.location}/valid/annotations.jsonl",
    image_directory_path = f"{dataset.location}/valid/"
)
train_loader = DataLoader(train_dataset, batch_size=BATCH_SIZE, collate_fn=collate_fn, num_workers=NUM_WORKERS, shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=BATCH_SIZE, collate_fn=collate_fn, num_workers=NUM_WORKERS)
# @title Setup LoRA Florence-2 model
# config = LoraConfig(
# r=8,
# lora_alpha=8,
# target_modules=["q_proj", "o_proj", "k_proj", "v_proj", "linear", "Conv2d", "lm_head", "fc2"],
# task_type="CAUSAL_LM",
# lora_dropout=0.05,
# bias="none",
# inference_mode=False,
# use_rslora=True,
# init_lora_weights="gaussian",
# )
config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "o_proj", "k_proj", "v_proj", "linear", "Conv2d", "lm_head", "fc2"],
    task_type="CAUSAL_LM",
    lora_dropout=0.05,
    bias="none",
    # inference_mode=False,
    # use_rslora=True,
)
peft_model = get_peft_model(model, config)
peft_model.print_trainable_parameters()
trainable params: 1,929,928 || all params: 272,733,896 || trainable%: 0.7076
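To make the numbers above concrete, here is a small arithmetic sketch. A LoRA adapter on a linear layer of shape d_in x d_out adds two low-rank matrices with r * (d_in + d_out) parameters in total, and with the default (non-rsLoRA) setting the update is scaled by lora_alpha / r. The 768-wide layer is a hypothetical example, not a dimension read from the Florence-2 checkpoint; the parameter totals are the ones printed above.

```python
def lora_param_count(d_in: int, d_out: int, r: int) -> int:
    """Parameters added by one LoRA adapter: A is (r x d_in), B is (d_out x r)."""
    return r * (d_in + d_out)


def trainable_percent(trainable: int, total: int) -> float:
    """Share of trainable parameters, in percent."""
    return 100.0 * trainable / total


R, LORA_ALPHA = 8, 16          # values from the LoraConfig above
scaling = LORA_ALPHA / R       # standard LoRA scaling (use_rslora is left disabled)

HIDDEN = 768                   # hypothetical layer width, for illustration only
per_layer = lora_param_count(HIDDEN, HIDDEN, R)

# Totals copied from the print_trainable_parameters() output above.
pct = trainable_percent(1_929_928, 272_733_896)

print(f"scaling={scaling}, adapter params for a {HIDDEN}x{HIDDEN} layer={per_layer}, trainable={pct:.4f}%")
```

Raising `r` grows the adapter linearly, so doubling the rank roughly doubles the trainable parameter count while the frozen base weights stay the same.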
torch.cuda.empty_cache()
# @title Run inference with pre-trained Florence-2 model on validation dataset
def render_inline(image: Image.Image, resize=(128, 128)):
    """Convert image into inline html."""
    # `resize` returns a new image, so the result must be reassigned.
    image = image.resize(resize)
    with io.BytesIO() as buffer:
        image.save(buffer, format='jpeg')
        image_b64 = str(base64.b64encode(buffer.getvalue()), "utf-8")
        return f"data:image/jpeg;base64,{image_b64}"

def render_example(image: Image.Image, response):
    try:
        detections = sv.Detections.from_lmm(sv.LMM.FLORENCE_2, response, resolution_wh=image.size)
        # `BoundingBoxAnnotator` is deprecated in supervision and has been renamed to `BoxAnnotator`.
        image = sv.BoxAnnotator(color_lookup=sv.ColorLookup.INDEX).annotate(image.copy(), detections)
        image = sv.LabelAnnotator(color_lookup=sv.ColorLookup.INDEX).annotate(image, detections)
    except Exception as error:
        print(f'failed to render model response: {error}')
    return f"""
<div style="display: inline-flex; align-items: center; justify-content: center;">
<img style="width:256px; height:256px;" src="{render_inline(image, resize=(128, 128))}" />
<p style="width:512px; margin:10px; font-size:small;">{html.escape(json.dumps(response))}</p>
</div>
"""

def render_inference_results(model, dataset: DetectionDataset, count: int):
    html_out = ""
    count = min(count, len(dataset))
    for i in range(count):
        image, data = dataset.dataset[i]
        prefix = data['prefix']
        suffix = data['suffix']
        inputs = processor(text=prefix, images=image, return_tensors="pt").to(DEVICE)
        generated_ids = model.generate(
            input_ids=inputs["input_ids"],
            pixel_values=inputs["pixel_values"],
            max_new_tokens=1024,
            num_beams=3
        )
        generated_text = processor.batch_decode(generated_ids, skip_special_tokens=False)[0]
        answer = processor.post_process_generation(generated_text, task='<OD>', image_size=image.size)
        html_out += render_example(image, answer)
    display(HTML(html_out))

render_inference_results(peft_model, val_dataset, 4)
{"<OD>": {"bboxes": [[0.3199999928474426, 0.3199999928474426, 639.0399780273438, 639.0399780273438]], "labels": ["bed"]}}
{"<OD>": {"bboxes": [[0.3199999928474426, 0.3199999928474426, 639.0399780273438, 639.0399780273438]], "labels": ["table"]}}
{"<OD>": {"bboxes": [[0.3199999928474426, 0.3199999928474426, 639.0399780273438, 639.0399780273438]], "labels": ["chair"]}}
{"<OD>": {"bboxes": [[0.3199999928474426, 0.3199999928474426, 639.0399780273438, 639.0399780273438]], "labels": ["furniture"]}}
Fine-tune Florence-2 on custom object detection dataset
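The training loop in this section pairs the optimizer with a linear schedule and zero warmup steps, so the learning rate falls in a straight line from its starting value to zero over `epochs * len(train_loader)` steps. The sketch below reproduces that shape in plain Python to show what the rate looks like at each epoch boundary. The function `linear_lr` is a hypothetical stand-in written for illustration, not the library's implementation; the 136 steps per epoch are read from the progress bars in the training output, and 5e-6 is the LR passed to the run.

```python
def linear_lr(step: int, base_lr: float, total_steps: int, warmup_steps: int = 0) -> float:
    """Learning rate after `step` optimizer steps: linear warmup, then linear decay to 0."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


EPOCHS = 10
STEPS_PER_EPOCH = 136   # taken from the tqdm progress bars in the training output
BASE_LR = 5e-6          # the LR value used for the training run

total_steps = EPOCHS * STEPS_PER_EPOCH

for epoch in range(EPOCHS + 1):
    step = epoch * STEPS_PER_EPOCH
    print(f"after epoch {epoch:2d}: lr = {linear_lr(step, BASE_LR, total_steps):.2e}")
```

Because the decay is tied to the total step count, changing the number of epochs or the batch size also changes how quickly the rate drops, which is worth keeping in mind when comparing runs.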
# @title Define train loop
def train_model(train_loader, val_loader, model, processor, epochs=10, lr=1e-6):
    optimizer = AdamW(model.parameters(), lr=lr)
    num_training_steps = epochs * len(train_loader)
    lr_scheduler = get_scheduler(
        name="linear",
        optimizer=optimizer,
        num_warmup_steps=0,
        num_training_steps=num_training_steps,
    )
    # Show predictions before any training, using the model passed in.
    render_inference_results(model, val_loader.dataset, 6)
    for epoch in range(epochs):
        model.train()
        train_loss = 0
        for inputs, answers in tqdm(train_loader, desc=f"Training Epoch {epoch + 1}/{epochs}"):
            input_ids = inputs["input_ids"]
            pixel_values = inputs["pixel_values"]
            # Tokenize the target strings (labels plus <loc_N> tokens).
            labels = processor.tokenizer(
                text=answers,
                return_tensors="pt",
                padding=True,
                return_token_type_ids=False
            ).input_ids.to(DEVICE)
            outputs = model(input_ids=input_ids, pixel_values=pixel_values, labels=labels)
            loss = outputs.loss
            loss.backward()
            optimizer.step()
            lr_scheduler.step()
            optimizer.zero_grad()
            train_loss += loss.item()
        avg_train_loss = train_loss / len(train_loader)
        print(f"Average Training Loss: {avg_train_loss}")
        model.eval()
        val_loss = 0
        with torch.no_grad():
            for inputs, answers in tqdm(val_loader, desc=f"Validation Epoch {epoch + 1}/{epochs}"):
                input_ids = inputs["input_ids"]
                pixel_values = inputs["pixel_values"]
                labels = processor.tokenizer(
                    text=answers,
                    return_tensors="pt",
                    padding=True,
                    return_token_type_ids=False
                ).input_ids.to(DEVICE)
                outputs = model(input_ids=input_ids, pixel_values=pixel_values, labels=labels)
                loss = outputs.loss
                val_loss += loss.item()
            avg_val_loss = val_loss / len(val_loader)
            print(f"Average Validation Loss: {avg_val_loss}")
            render_inference_results(model, val_loader.dataset, 6)
        # Save the adapter weights and processor after every epoch.
        output_dir = f"./model_checkpoints/epoch_{epoch+1}"
        os.makedirs(output_dir, exist_ok=True)
        model.save_pretrained(output_dir)
        processor.save_pretrained(output_dir)
%%time
EPOCHS = 10
LR = 5e-6
train_model(train_loader, val_loader, peft_model, processor, epochs=EPOCHS, lr=LR)
This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning
{"<OD>": {"bboxes": [[0.3199999928474426, 0.3199999928474426, 639.0399780273438, 639.0399780273438]], "labels": ["bed"]}}
{"<OD>": {"bboxes": [[0.3199999928474426, 0.3199999928474426, 639.0399780273438, 639.0399780273438]], "labels": ["table"]}}
{"<OD>": {"bboxes": [[0.3199999928474426, 0.3199999928474426, 639.0399780273438, 639.0399780273438]], "labels": ["chair"]}}
{"<OD>": {"bboxes": [[0.3199999928474426, 0.3199999928474426, 639.0399780273438, 639.0399780273438]], "labels": ["furniture"]}}
{"<OD>": {"bboxes": [[0.3199999928474426, 0.3199999928474426, 639.0399780273438, 639.0399780273438]], "labels": ["bed"]}}
{"<OD>": {"bboxes": [[0.3199999928474426, 0.3199999928474426, 639.0399780273438, 639.0399780273438]], "labels": ["tablecloth"]}}
Training Epoch 1/10: 100%|██████████| 136/136 [02:53<00:00, 1.28s/it]
Average Training Loss: 6.217629111864987
Validation Epoch 1/10: 100%|██████████| 8/8 [00:04<00:00, 1.80it/s]
Average Validation Loss: 5.286705791950226
{"<OD>": {"bboxes": [[0.3199999928474426, 0.3199999928474426, 639.0399780273438, 639.0399780273438]], "labels": ["bed"]}}
{"<OD>": {"bboxes": [[0.3199999928474426, 0.3199999928474426, 639.0399780273438, 639.0399780273438]], "labels": ["table"]}}
{"<OD>": {"bboxes": [[0.3199999928474426, 0.3199999928474426, 639.0399780273438, 639.0399780273438]], "labels": ["chair"]}}
{"<OD>": {"bboxes": [[0.3199999928474426, 0.3199999928474426, 639.0399780273438, 639.0399780273438]], "labels": ["furniture"]}}
{"<OD>": {"bboxes": [[0.3199999928474426, 0.3199999928474426, 639.0399780273438, 639.0399780273438]], "labels": ["bed"]}}
{"<OD>": {"bboxes": [[0.3199999928474426, 0.3199999928474426, 639.0399780273438, 639.0399780273438]], "labels": ["tablecloth"]}}
Setting `save_embedding_layers` to `True` as embedding layers found in `target_modules`.
Training Epoch 2/10: 100%|██████████| 136/136 [02:31<00:00, 1.11s/it]
Average Training Loss: 5.043076939442578
Validation Epoch 2/10: 100%|██████████| 8/8 [00:04<00:00, 1.75it/s]
Average Validation Loss: 4.181661516427994
{"<OD>": {"bboxes": [[0.3199999928474426, 0.3199999928474426, 639.0399780273438, 639.0399780273438]], "labels": ["bed"]}}
{"<OD>": {"bboxes": [[198.0800018310547, 175.0399932861328, 487.3599853515625, 496.3199768066406], [0.3199999928474426, 129.59999084472656, 267.1999816894531, 410.55999755859375]], "labels": ["playing card", "playing card"]}}
{"<OD>": {"bboxes": [[321.6000061035156, 212.1599884033203, 344.0, 243.51998901367188]], "labels": ["human face"]}}
{"<OD>": {"bboxes": [[0.3199999928474426, 0.3199999928474426, 639.0399780273438, 639.0399780273438]], "labels": ["furniture"]}}
{"<OD>": {"bboxes": [[0.3199999928474426, 0.3199999928474426, 639.0399780273438, 639.0399780273438]], "labels": ["bed"]}}
{"<OD>": {"bboxes": [[0.3199999928474426, 0.3199999928474426, 639.0399780273438, 639.0399780273438]], "labels": ["dining table"]}}
Training Epoch 3/10: 100%|██████████| 136/136 [02:32<00:00, 1.12s/it]
Average Training Loss: 4.1506180728183075
Validation Epoch 3/10: 100%|██████████| 8/8 [00:04<00:00, 1.77it/s]
Average Validation Loss: 3.521006762981415
{"<OD>": {"bboxes": [[372.79998779296875, 112.95999908447266, 512.3200073242188, 357.44000244140625]], "labels": ["queen of spades"]}}
{"<OD>": {"bboxes": [[0.3199999928474426, 128.3199920654297, 267.1999816894531, 411.8399963378906], [197.44000244140625, 173.75999450683594, 488.0, 496.9599914550781], [198.72000122070312, 82.23999786376953, 380.47998046875, 323.5199890136719], [333.1199951171875, 43.20000076293945, 516.1599731445312, 207.0399932861328]], "labels": ["6 of hearts", "6 of diamonds", "7 of diamonds", "5 of diamonds"]}}
{"<OD>": {"bboxes": [[185.9199981689453, 274.239990234375, 395.8399963378906, 511.67999267578125]], "labels": ["queen of spades"]}}
{"<OD>": {"bboxes": [[56.0, 229.44000244140625, 331.1999816894531, 639.0399780273438], [296.6399841308594, 252.47999572753906, 459.8399963378906, 550.719970703125], [436.79998779296875, 157.1199951171875, 557.1199951171875, 392.0]], "labels": ["queen of spades", "6 of spade", "6 of clubs"]}}
{"<OD>": {"bboxes": [[15.039999961853027, 255.0399932861328, 213.44000244140625, 464.9599914550781], [208.95999145507812, 285.7599792480469, 345.2799987792969, 461.7599792480469]], "labels": ["queen of spades", "queen card"]}}
{"<OD>": {"bboxes": [[294.0799865722656, 176.3199920654297, 624.9599609375, 399.03997802734375], [11.199999809265137, 228.1599884033203, 275.5199890136719, 427.8399963378906]], "labels": ["6 of spades", "7 of spade"]}}
Training Epoch 4/10: 100%|██████████| 136/136 [02:31<00:00, 1.12s/it]
Average Training Loss: 3.746520130073323
Validation Epoch 4/10: 100%|██████████| 8/8 [00:04<00:00, 1.81it/s]
Average Validation Loss: 3.1994041204452515
{"<OD>": {"bboxes": [[372.79998779296875, 112.31999969482422, 512.3200073242188, 357.44000244140625], [164.16000366210938, 330.55999755859375, 301.1199951171875, 585.2799682617188]], "labels": ["queen of spades", "king of spade"]}}
{"<OD>": {"bboxes": [[0.3199999928474426, 128.3199920654297, 268.47998046875, 412.47998046875], [198.72000122070312, 82.23999786376953, 381.1199951171875, 324.1600036621094], [333.1199951171875, 43.20000076293945, 516.7999877929688, 207.67999267578125]], "labels": ["6 of clubs", "7 of clubs", "5 of clubs"]}}
{"<OD>": {"bboxes": [[185.9199981689453, 273.6000061035156, 395.8399963378906, 511.67999267578125], [368.3199768066406, 235.1999969482422, 517.4400024414062, 491.1999816894531]], "labels": ["queen of spades", "queen spades"]}}
{"<OD>": {"bboxes": [[56.0, 228.79998779296875, 331.8399963378906, 639.0399780273438], [437.44000244140625, 156.47999572753906, 557.1199951171875, 392.0]], "labels": ["8 of spades", "6 of spade"]}}
{"<OD>": {"bboxes": [[13.119999885559082, 254.39999389648438, 213.44000244140625, 466.239990234375], [208.95999145507812, 285.1199951171875, 345.2799987792969, 463.67999267578125]], "labels": ["queen of spades", "queen card"]}}
{"<OD>": {"bboxes": [[293.44000244140625, 176.3199920654297, 624.9599609375, 399.03997802734375], [11.199999809265137, 228.1599884033203, 275.5199890136719, 428.47998046875], [96.95999908447266, 432.9599914550781, 314.55999755859375, 566.0800170898438], [309.44000244140625, 423.3599853515625, 548.7999877929688, 563.5199584960938]], "labels": ["9 of spades", "7 of clubs", "9 of clubs", "9 of hearts"]}}
Training Epoch 5/10: 100%|██████████| 136/136 [02:33<00:00, 1.13s/it]
Average Training Loss: 3.459920299403808
Validation Epoch 5/10: 100%|██████████| 8/8 [00:04<00:00, 1.81it/s]
Average Validation Loss: 2.9735917448997498
{"<OD>": {"bboxes": [[372.79998779296875, 112.95999908447266, 512.3200073242188, 357.44000244140625], [164.16000366210938, 330.55999755859375, 301.1199951171875, 585.2799682617188], [173.1199951171875, 14.399999618530273, 303.03997802734375, 252.47999572753906], [53.439998626708984, 239.67999267578125, 165.44000244140625, 469.44000244140625]], "labels": ["queen of spades", "king of spade", "queen spades", "9 of spoons"]}}
{"<OD>": {"bboxes": [[0.3199999928474426, 128.3199920654297, 268.47998046875, 412.47998046875], [198.72000122070312, 82.23999786376953, 381.1199951171875, 324.1600036621094], [333.1199951171875, 43.20000076293945, 516.7999877929688, 207.67999267578125]], "labels": ["6 of clubs", "7 of clubs", "5 of clubs"]}}
{"<OD>": {"bboxes": [[185.9199981689453, 273.6000061035156, 396.47998046875, 511.67999267578125], [368.3199768066406, 235.1999969482422, 517.4400024414062, 491.1999816894531]], "labels": ["queen of spades", "9 of clubs"]}}
{"<OD>": {"bboxes": [[55.36000061035156, 228.79998779296875, 333.1199951171875, 639.0399780273438], [436.79998779296875, 156.47999572753906, 557.760009765625, 392.0], [297.91998291015625, 252.47999572753906, 458.55999755859375, 550.0800170898438]], "labels": ["8 of spades", "6 of spade", "7 of spoons"]}}
{"<OD>": {"bboxes": [[463.67999267578125, 221.1199951171875, 636.47998046875, 406.7200012207031], [208.95999145507812, 285.1199951171875, 345.91998291015625, 463.67999267578125]], "labels": ["queen of spades", "7 of hearts"]}}
{"<OD>": {"bboxes": [[10.559999465942383, 228.1599884033203, 275.5199890136719, 428.47998046875], [98.23999786376953, 433.5999755859375, 314.55999755859375, 565.4400024414062], [310.7200012207031, 424.6399841308594, 548.1599731445312, 562.8800048828125]], "labels": ["2 of spades", "5 of spade", "6 of spoons"]}}
Training Epoch 6/10: 100%|██████████| 136/136 [01:56<00:00, 1.17it/s]
Average Training Loss: 3.2458674925215103
Validation Epoch 6/10: 100%|██████████| 8/8 [00:02<00:00, 2.75it/s]
Average Validation Loss: 2.800899028778076
{"<OD>": {"bboxes": [[372.79998779296875, 112.31999969482422, 512.3200073242188, 357.44000244140625], [162.87998962402344, 330.55999755859375, 301.1199951171875, 585.2799682617188], [53.439998626708984, 239.0399932861328, 166.72000122070312, 469.44000244140625]], "labels": ["queen of spades", "king of spade", "9 of hearts"]}}
{"<OD>": {"bboxes": [[0.3199999928474426, 128.3199920654297, 268.47998046875, 412.47998046875], [198.72000122070312, 82.23999786376953, 381.1199951171875, 324.1600036621094], [330.55999755859375, 42.55999755859375, 517.4400024414062, 207.67999267578125]], "labels": ["6 of clubs", "7 of clubs", "5 of clubs"]}}
{"<OD>": {"bboxes": [[185.27999877929688, 272.32000732421875, 396.47998046875, 511.67999267578125], [367.67999267578125, 234.55999755859375, 518.0800170898438, 491.1999816894531]], "labels": ["queen of spades", "9 of clubs"]}}
{"<OD>": {"bboxes": [[437.44000244140625, 157.75999450683594, 556.47998046875, 391.3599853515625], [298.55999755859375, 254.39999389648438, 457.2799987792969, 549.4400024414062]], "labels": ["6 of spades", "7 of spade"]}}
{"<OD>": {"bboxes": [[463.67999267578125, 221.1199951171875, 636.47998046875, 406.0799865722656], [208.95999145507812, 285.1199951171875, 345.91998291015625, 461.7599792480469]], "labels": ["queen of spades", "7 of hearts"]}}
{"<OD>": {"bboxes": [[10.559999465942383, 228.1599884033203, 275.5199890136719, 428.47998046875], [98.23999786376953, 433.5999755859375, 314.55999755859375, 564.7999877929688], [310.7200012207031, 424.0, 548.7999877929688, 562.239990234375]], "labels": ["2 of spades", "5 of spade", "10 of sp clubs"]}}
Training Epoch 7/10: 100%|██████████| 136/136 [01:32<00:00, 1.46it/s]
Average Training Loss: 3.0911339188323304
Validation Epoch 7/10: 100%|██████████| 8/8 [00:02<00:00, 2.77it/s]
Average Validation Loss: 2.675829589366913
{"<OD>": {"bboxes": [[372.79998779296875, 112.95999908447266, 512.3200073242188, 357.44000244140625], [162.87998962402344, 330.55999755859375, 301.1199951171875, 585.2799682617188], [53.439998626708984, 239.0399932861328, 166.72000122070312, 469.44000244140625]], "labels": ["queen of spades", "king of spade", "9 of spoons"]}}
{"<OD>": {"bboxes": [[0.3199999928474426, 128.3199920654297, 267.1999816894531, 411.8399963378906], [198.72000122070312, 82.23999786376953, 381.1199951171875, 324.1600036621094], [333.1199951171875, 43.20000076293945, 516.7999877929688, 207.67999267578125]], "labels": ["6 of clubs", "7 of clubs", "5 of clubs"]}}
{"<OD>": {"bboxes": [[185.27999877929688, 272.32000732421875, 396.47998046875, 511.67999267578125], [367.03997802734375, 234.55999755859375, 518.0800170898438, 491.1999816894531]], "labels": ["queen of spades", "9 of spade"]}}
{"<OD>": {"bboxes": [[437.44000244140625, 157.75999450683594, 556.47998046875, 391.3599853515625], [298.55999755859375, 254.39999389648438, 457.2799987792969, 549.4400024414062]], "labels": ["6 of spades", "7 of spade"]}}
{"<OD>": {"bboxes": [[463.67999267578125, 221.1199951171875, 636.47998046875, 406.0799865722656], [208.95999145507812, 285.7599792480469, 345.91998291015625, 462.3999938964844]], "labels": ["queen of spades", "7 of hearts"]}}
{"<OD>": {"bboxes": [[10.559999465942383, 228.1599884033203, 275.5199890136719, 428.47998046875], [98.23999786376953, 433.5999755859375, 314.55999755859375, 564.7999877929688], [310.7200012207031, 424.6399841308594, 548.1599731445312, 562.239990234375]], "labels": ["2 of spades", "5 of spade", "9 of sp clubs"]}}
Training Epoch 8/10: 100%|██████████| 136/136 [01:33<00:00, 1.46it/s]
Average Training Loss: 2.9872096601654503
Validation Epoch 8/10: 100%|██████████| 8/8 [00:02<00:00, 2.73it/s]
Average Validation Loss: 2.5895615220069885
{"<OD>": {"bboxes": [[372.79998779296875, 112.31999969482422, 512.3200073242188, 357.44000244140625], [162.87998962402344, 330.55999755859375, 301.1199951171875, 585.2799682617188], [53.439998626708984, 239.0399932861328, 167.36000061035156, 469.44000244140625]], "labels": ["queen of spades", "king of spade", "9 of spoons"]}}
{"<OD>": {"bboxes": [[0.3199999928474426, 127.68000030517578, 268.47998046875, 412.47998046875], [198.72000122070312, 82.23999786376953, 381.1199951171875, 324.1600036621094], [330.55999755859375, 42.55999755859375, 517.4400024414062, 207.67999267578125]], "labels": ["9 of clubs", "7 of clubs", "5 of clubs"]}}
{"<OD>": {"bboxes": [[185.9199981689453, 273.6000061035156, 396.47998046875, 511.67999267578125], [368.3199768066406, 235.1999969482422, 517.4400024414062, 491.1999816894531], [19.520000457763672, 289.6000061035156, 223.67999267578125, 582.0800170898438], [86.72000122070312, 164.16000366210938, 255.67999267578125, 403.5199890136719]], "labels": ["queen of spades", "9 of spade", "10 of sp clubs", "king of clubs"]}}
{"<OD>": {"bboxes": [[437.44000244140625, 157.75999450683594, 556.47998046875, 391.3599853515625], [298.55999755859375, 254.39999389648438, 457.2799987792969, 549.4400024414062]], "labels": ["6 of spades", "7 of spade"]}}
{"<OD>": {"bboxes": [[463.67999267578125, 221.1199951171875, 636.47998046875, 406.0799865722656], [208.95999145507812, 285.7599792480469, 345.91998291015625, 461.7599792480469]], "labels": ["queen of spades", "7 of hearts"]}}
{"<OD>": {"bboxes": [[10.559999465942383, 228.1599884033203, 275.5199890136719, 428.47998046875], [98.23999786376953, 433.5999755859375, 314.55999755859375, 564.1599731445312], [310.7200012207031, 424.0, 548.7999877929688, 562.239990234375]], "labels": ["2 of spades", "5 of spade", "9 of sp clubs"]}}
Training Epoch 9/10: 100%|██████████| 136/136 [01:33<00:00, 1.46it/s]
Average Training Loss: 2.8879061095854817
Validation Epoch 9/10: 100%|██████████| 8/8 [00:03<00:00, 2.59it/s]
Average Validation Loss: 2.5414960384368896
{"<OD>": {"bboxes": [[372.79998779296875, 112.31999969482422, 512.3200073242188, 357.44000244140625], [162.87998962402344, 330.55999755859375, 301.1199951171875, 585.2799682617188], [53.439998626708984, 239.0399932861328, 167.36000061035156, 469.44000244140625]], "labels": ["queen of spades", "king of spade", "9 of spoons"]}}
{"<OD>": {"bboxes": [[0.3199999928474426, 128.3199920654297, 267.1999816894531, 411.8399963378906], [198.72000122070312, 82.23999786376953, 381.1199951171875, 324.1600036621094], [333.1199951171875, 43.20000076293945, 516.7999877929688, 207.67999267578125]], "labels": ["9 of clubs", "7 of clubs", "5 of clubs"]}}
{"<OD>": {"bboxes": [[185.9199981689453, 273.6000061035156, 396.47998046875, 511.67999267578125], [368.3199768066406, 235.1999969482422, 517.4400024414062, 491.1999816894531], [19.520000457763672, 289.6000061035156, 223.67999267578125, 582.0800170898438], [86.72000122070312, 164.16000366210938, 255.67999267578125, 403.5199890136719]], "labels": ["queen of spades", "9 of spade", "10 of sp clubs", "king of spates"]}}
{"<OD>": {"bboxes": [[437.44000244140625, 157.75999450683594, 556.47998046875, 391.3599853515625], [298.55999755859375, 254.39999389648438, 457.2799987792969, 550.0800170898438]], "labels": ["6 of spades", "7 of spade"]}}
{"<OD>": {"bboxes": [[463.67999267578125, 221.1199951171875, 636.47998046875, 406.0799865722656], [208.95999145507812, 285.7599792480469, 345.91998291015625, 462.3999938964844], [328.6399841308594, 192.3199920654297, 467.5199890136719, 398.3999938964844]], "labels": ["queen of spades", "7 of hearts", "6 of spade"]}}
{"<OD>": {"bboxes": [[10.559999465942383, 228.1599884033203, 275.5199890136719, 428.47998046875], [98.23999786376953, 433.5999755859375, 314.55999755859375, 564.1599731445312], [310.7200012207031, 424.6399841308594, 548.7999877929688, 562.239990234375]], "labels": ["2 of spades", "5 of spade", "9 of sp clubs"]}}
Training Epoch 10/10: 100%|██████████| 136/136 [01:32<00:00, 1.47it/s]
Average Training Loss: 2.8460742568268493
Validation Epoch 10/10: 100%|██████████| 8/8 [00:02<00:00, 2.74it/s]
Average Validation Loss: 2.526287943124771
{"<OD>": {"bboxes": [[372.79998779296875, 112.31999969482422, 512.3200073242188, 357.44000244140625], [162.87998962402344, 330.55999755859375, 301.1199951171875, 585.2799682617188], [53.439998626708984, 239.0399932861328, 167.36000061035156, 469.44000244140625]], "labels": ["queen of spades", "king of spade", "9 of spoons"]}}
{"<OD>": {"bboxes": [[0.3199999928474426, 128.3199920654297, 267.1999816894531, 411.8399963378906], [198.72000122070312, 82.23999786376953, 381.1199951171875, 324.1600036621094], [333.1199951171875, 43.20000076293945, 516.7999877929688, 207.67999267578125]], "labels": ["9 of clubs", "7 of clubs", "5 of clubs"]}}
{"<OD>": {"bboxes": [[185.9199981689453, 273.6000061035156, 396.47998046875, 511.67999267578125], [368.3199768066406, 235.1999969482422, 517.4400024414062, 491.1999816894531], [19.520000457763672, 289.6000061035156, 223.67999267578125, 582.0800170898438], [86.72000122070312, 164.16000366210938, 255.67999267578125, 403.5199890136719]], "labels": ["queen of spades", "9 of spade", "10 of sp clubs", "king of spates"]}}
{"<OD>": {"bboxes": [[437.44000244140625, 157.75999450683594, 556.47998046875, 391.3599853515625], [298.55999755859375, 254.39999389648438, 457.2799987792969, 550.0800170898438]], "labels": ["6 of spades", "7 of spade"]}}
{"<OD>": {"bboxes": [[463.67999267578125, 221.1199951171875, 636.47998046875, 406.0799865722656], [208.95999145507812, 285.7599792480469, 345.91998291015625, 462.3999938964844], [328.6399841308594, 192.3199920654297, 467.5199890136719, 398.3999938964844]], "labels": ["queen of spades", "7 of hearts", "6 of spade"]}}
{"<OD>": {"bboxes": [[10.559999465942383, 228.1599884033203, 275.5199890136719, 428.47998046875], [98.23999786376953, 433.5999755859375, 314.55999755859375, 564.1599731445312], [310.7200012207031, 424.6399841308594, 548.7999877929688, 562.239990234375]], "labels": ["2 of spades", "5 of spade", "9 of sp clubs"]}}
CPU times: user 19min 11s, sys: 4min 23s, total: 23min 34s
Wall time: 22min 28s
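To make the trend in the log above easier to read, the sketch below tabulates the average losses for epochs 4 to 10 (copied from the log and rounded to four decimals) and prints the epoch-to-epoch change in validation loss. The names `LOSSES` and `val_loss_deltas` are illustrative and not part of the notebook.

```python
# (epoch, average training loss, average validation loss), taken from the
# training log above and rounded to four decimals.
LOSSES = [
    (4, 3.7465, 3.1994),
    (5, 3.4599, 2.9736),
    (6, 3.2459, 2.8009),
    (7, 3.0911, 2.6758),
    (8, 2.9872, 2.5896),
    (9, 2.8879, 2.5415),
    (10, 2.8461, 2.5263),
]


def val_loss_deltas(losses):
    """Return (epoch, change in validation loss versus the previous epoch)."""
    return [
        (cur[0], round(cur[2] - prev[2], 4))
        for prev, cur in zip(losses, losses[1:])
    ]


for epoch, delta in val_loss_deltas(LOSSES):
    print(f"epoch {epoch}: validation loss change {delta:+.4f}")
```

The validation loss falls in every epoch, but the improvement shrinks from roughly 0.23 between epochs 4 and 5 to roughly 0.015 between epochs 9 and 10, which suggests that additional epochs with the same settings would bring only small gains.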
Fine-tuned model evaluation
# @title Check if the model can still detect objects outside of the custom dataset
image = Image.open(EXAMPLE_IMAGE_PATH)
task = "<OD>"
text = "<OD>"
inputs = processor(text=text, images=image, return_tensors="pt").to(DEVICE)
generated_ids = peft_model.generate(
    input_ids=inputs["input_ids"],
    pixel_values=inputs["pixel_values"],
    max_new_tokens=1024,
    num_beams=3
)
generated_text = processor.batch_decode(generated_ids, skip_special_tokens=False)[0]
response = processor.post_process_generation(generated_text, task=task, image_size=(image.width, image.height))
detections = sv.Detections.from_lmm(sv.LMM.FLORENCE_2, response, resolution_wh=image.size)
bounding_box_annotator = sv.BoundingBoxAnnotator(color_lookup=sv.ColorLookup.INDEX)
label_annotator = sv.LabelAnnotator(color_lookup=sv.ColorLookup.INDEX)
image = bounding_box_annotator.annotate(image, detections)
image = label_annotator.annotate(image, detections)
image.thumbnail((600, 600))
image
BoundingBoxAnnotator is deprecated: `BoundingBoxAnnotator` is deprecated and has been renamed to `BoxAnnotator`. `BoundingBoxAnnotator` will be removed in supervision-0.26.0.

NOTE: It seems that the model can still detect classes that don’t belong to our custom dataset.
# @title Collect predictions
PATTERN = r'([a-zA-Z0-9 ]+ of [a-zA-Z0-9 ]+)<loc_\d+>'
def extract_classes(dataset: DetectionDataset):
    # Collect every distinct class name that appears in the dataset suffixes.
    class_set = set()
    for i in range(len(dataset.dataset)):
        image, data = dataset.dataset[i]
        suffix = data["suffix"]
        classes = re.findall(PATTERN, suffix)
        class_set.update(classes)
    return sorted(class_set)
CLASSES = extract_classes(train_dataset)
targets = []
predictions = []
for i in range(len(val_dataset.dataset)):
    image, data = val_dataset.dataset[i]
    prefix = data['prefix']
    suffix = data['suffix']
    inputs = processor(text=prefix, images=image, return_tensors="pt").to(DEVICE)
    generated_ids = model.generate(
        input_ids=inputs["input_ids"],
        pixel_values=inputs["pixel_values"],
        max_new_tokens=1024,
        num_beams=3
    )
    generated_text = processor.batch_decode(generated_ids, skip_special_tokens=False)[0]
    prediction = processor.post_process_generation(generated_text, task='<OD>', image_size=image.size)
    prediction = sv.Detections.from_lmm(sv.LMM.FLORENCE_2, prediction, resolution_wh=image.size)
    # Keep only predictions whose label exactly matches a training class.
    prediction = prediction[np.isin(prediction['class_name'], CLASSES)]
    prediction.class_id = np.array([CLASSES.index(class_name) for class_name in prediction['class_name']])
    # The generated text carries no scores, so assign a constant confidence of 1.
    prediction.confidence = np.ones(len(prediction))
    # Parse the ground-truth suffix with the same post-processing as the prediction.
    target = processor.post_process_generation(suffix, task='<OD>', image_size=image.size)
    target = sv.Detections.from_lmm(sv.LMM.FLORENCE_2, target, resolution_wh=image.size)
    target.class_id = np.array([CLASSES.index(class_name) for class_name in target['class_name']])
    targets.append(target)
    predictions.append(prediction)
# @title Calculate mAP
# mean_average_precision = sv.MeanAveragePrecision.from_detections(
# predictions=predictions,
# targets=targets,
# )
mean_average_precision = sv.metrics.MeanAveragePrecision().update(predictions, targets).compute()
print(f"map50_95: {mean_average_precision.map50_95:.2f}")
print(f"map50: {mean_average_precision.map50:.2f}")
print(f"map75: {mean_average_precision.map75:.2f}")
map50_95: 0.18
map50: 0.20
map75: 0.20
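The suffixes in these metric names refer to intersection-over-union (IoU) thresholds: `map50` counts a prediction as a match when its IoU with a target box is at least 0.5, `map75` requires 0.75, and `map50_95` averages over thresholds from 0.5 to 0.95. As a minimal sketch of the underlying quantity, the helper below computes IoU for two boxes in the same `(x_min, y_min, x_max, y_max)` format as the `bboxes` in the log. The function name `box_iou` and the example boxes are illustrative; this is not supervision's implementation.

```python
def box_iou(box_a, box_b):
    """Intersection-over-union of two (x_min, y_min, x_max, y_max) boxes."""
    # Corners of the overlapping rectangle (if the boxes overlap at all).
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])

    inter_w = max(0.0, ix_max - ix_min)
    inter_h = max(0.0, iy_max - iy_min)
    intersection = inter_w * inter_h

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


# Hypothetical target and two hypothetical predictions.
target = (100, 100, 300, 400)
close_prediction = (120, 110, 310, 390)
shifted_prediction = (200, 100, 400, 400)

for name, box in [("close", close_prediction), ("shifted", shifted_prediction)]:
    iou = box_iou(target, box)
    print(f"{name}: IoU={iou:.3f}, match@0.5={iou >= 0.5}, match@0.75={iou >= 0.75}")
```

With these example boxes, the close prediction passes both thresholds while the shifted one passes neither.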
p = sv.metrics.Precision()
p = p.update(predictions, targets).compute()
print(p.precision_at_50)
r = sv.metrics.Recall()
r = r.update(predictions, targets).compute()
print(r.recall_at_50)
0.2074450084602369
0.14213197969543148
invalid value encountered in divide
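The `invalid value encountered in divide` line is the warning NumPy emits when a division evaluates 0/0. A likely cause here, stated as an assumption rather than something verified against supervision's source, is that some classes have no predictions (or no targets) in the small validation set, so their per-class precision or recall is undefined. The sketch below shows the two ratios with an explicit guard for that case. The helper names, the per-class counts, and the choice to return 0.0 for an undefined ratio are all illustrative.

```python
def precision(true_positives, false_positives):
    """TP / (TP + FP), or 0.0 when the class was never predicted."""
    predicted = true_positives + false_positives
    return true_positives / predicted if predicted > 0 else 0.0


def recall(true_positives, false_negatives):
    """TP / (TP + FN), or 0.0 when the class has no ground-truth boxes."""
    actual = true_positives + false_negatives
    return true_positives / actual if actual > 0 else 0.0


# Hypothetical per-class counts: (true positives, false positives, false negatives).
counts = {
    "queen of spades": (3, 1, 2),
    "9 of hearts": (0, 0, 4),   # never predicted: precision would be 0/0
    "king of clubs": (0, 2, 0),  # no targets: recall would be 0/0
}

for class_name, (tp, fp, fn) in counts.items():
    print(f"{class_name}: precision={precision(tp, fp):.2f}, recall={recall(tp, fn):.2f}")
```

Whether an undefined ratio should count as 0.0 or be excluded from the average is a design choice, and it changes the averaged score, so it is worth checking how a given metrics library handles it before comparing numbers.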
# @title Calculate Confusion Matrix
confusion_matrix = sv.ConfusionMatrix.from_detections(
    predictions=predictions,
    targets=targets,
    classes=CLASSES
)
_ = confusion_matrix.plot()
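The metrics and confusion matrix above only see predictions whose label exactly matches one of the class names extracted by `PATTERN`, so near-miss labels from the training log such as "king of spade" or "9 of spoons" are discarded before scoring. The sketch below reproduces that behavior with the standard library only. The suffix string and its location values are hypothetical, and the list comprehension stands in for the `np.isin` filter used in the notebook.

```python
import re

# Same pattern as in the notebook: a "<rank> of <suit>" label followed by a location token.
PATTERN = r'([a-zA-Z0-9 ]+ of [a-zA-Z0-9 ]+)<loc_\d+>'

# Hypothetical ground-truth suffix in Florence-2 object detection format.
suffix = (
    "queen of spades<loc_459><loc_275><loc_976><loc_623>"
    "king of spades<loc_17><loc_356><loc_430><loc_668>"
)

classes = sorted(set(re.findall(PATTERN, suffix)))
print(classes)  # ['king of spades', 'queen of spades']

# Labels of the kind the fine-tuned model produced in the training log.
predicted_labels = ["queen of spades", "king of spade", "9 of spoons"]

# Exact-match filter: anything that is not a known class is dropped.
kept = [label for label in predicted_labels if label in classes]
print(kept)  # ['queen of spades']
```

Because a dropped prediction leaves its target unmatched, misspelled labels lower recall even when the predicted box is accurate.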
Save fine-tuned model on hard drive
peft_model.save_pretrained("/content/florence2-lora")
processor.save_pretrained("/content/florence2-lora/")
!ls -la /content/florence2-lora/
---------------------------------------------------------------------------
PermissionError                           Traceback (most recent call last)
Cell In[25], line 1
----> 1 peft_model.save_pretrained("/content/florence2-lora")
      2 processor.save_pretrained("/content/florence2-lora/")
      3 get_ipython().system('ls -la /content/florence2-lora/')

File /opt/anaconda3/envs/zeel_py310/lib/python3.10/site-packages/peft/peft_model.py:320, in PeftModel.save_pretrained(self, save_directory, safe_serialization, selected_adapters, save_embedding_layers, is_main_process, path_initial_model_for_weight_conversion, **kwargs)
    317 return output_state_dict
    319 if is_main_process:
--> 320 os.makedirs(save_directory, exist_ok=True)
    321 self.create_or_update_model_card(save_directory)
    323 for adapter_name in selected_adapters:

File /opt/anaconda3/envs/zeel_py310/lib/python3.10/os.py:215, in makedirs(name, mode, exist_ok)
    213 if head and tail and not path.exists(head):
    214 try:
--> 215 makedirs(head, exist_ok=exist_ok)
    216 except FileExistsError:
    217 # Defeats race condition when another thread created the path
    218 pass

File /opt/anaconda3/envs/zeel_py310/lib/python3.10/os.py:225, in makedirs(name, mode, exist_ok)
    223 return
    224 try:
--> 225 mkdir(name, mode)
    226 except OSError:
    227 # Cannot rely on checking for EEXIST, since the operating system
    228 # could give priority to other errors like EACCES or EROFS
    229 if not exist_ok or not path.isdir(name):

PermissionError: [Errno 13] Permission denied: '/content'
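The `PermissionError` above arises because `/content` is the Colab working directory, and this run executed in a local conda environment where that path cannot be created. One way around it, sketched below under the assumption that any writable folder is acceptable, is to try the Colab path first and fall back to a local directory. The helper `first_writable_dir` and the fallback order are illustrative and not part of the notebook.

```python
import tempfile
from pathlib import Path


def first_writable_dir(candidates):
    """Return the first candidate directory that can be created and written to."""
    for candidate in candidates:
        path = Path(candidate).expanduser()
        try:
            path.mkdir(parents=True, exist_ok=True)
            # Creating a temporary file proves that we have write access.
            with tempfile.TemporaryFile(dir=path):
                pass
            return path
        except OSError:
            # PermissionError (as for /content outside Colab) is an OSError.
            continue
    raise RuntimeError("no writable directory among the candidates")


# Hypothetical fallback order: Colab path, current directory, system temp folder.
SAVE_DIR = first_writable_dir([
    "/content/florence2-lora",
    Path.cwd() / "florence2-lora",
    Path(tempfile.gettempdir()) / "florence2-lora",
])
print(SAVE_DIR)
```

The resulting `SAVE_DIR` could then be passed to `peft_model.save_pretrained(...)` and `processor.save_pretrained(...)` in place of the hard-coded path, and reused as `model_path` when uploading the model.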
Upload model to Roboflow (optional)
You can deploy your Florence-2 object detection model on your own hardware (e.g., a cloud GPU server or an NVIDIA Jetson) with Roboflow Inference, an open source computer vision inference server.
To deploy your model, you will need a free Roboflow account.
To get started, create a new Project in Roboflow if you don’t already have one. Next, upload the dataset you used to train your model, then create a dataset Version, which is a snapshot of your dataset with which your model will be associated in Roboflow.
You can read our full Deploy Florence-2 with Roboflow guide for step-by-step instructions.
Once you have trained your model, you can upload it to Roboflow using the following code:
from roboflow import Roboflow
rf = Roboflow(api_key="API_KEY")
project = rf.workspace("workspace-id").project("project-id")
version = project.version(VERSION)
version.deploy(model_type="florence-2", model_path="/content/florence2-lora")
Above, replace:
- API_KEY with your Roboflow API key.
- workspace-id and project-id with your workspace and project IDs.
- VERSION with your project version.
If you are not using our notebook, replace /content/florence2-lora with the directory where you saved your model weights.
When you run the code above, the model will be uploaded to Roboflow. It will take a few minutes for the model to be processed before it is ready for use.
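Rather than pasting the API key into the upload code, you can read it from the environment. The sketch below assumes the key has been exported as an environment variable named `ROBOFLOW_API_KEY`, the same name used for the Colab secret in the setup section. The helper `require_env` is illustrative and not part of the Roboflow package.

```python
import os


def require_env(name):
    """Return the value of an environment variable, or fail with a clear message."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"environment variable {name} is not set")
    return value


# Usage with the upload code above (left commented out so this block runs
# without a key or a network connection):
# rf = Roboflow(api_key=require_env("ROBOFLOW_API_KEY"))
```

Failing early with a named variable makes a missing key easier to diagnose than an authentication error raised later during the upload.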
Deploy to your hardware
Once your model has been processed, you can download it to any device on which you want to deploy your model. Deployment is supported through Roboflow Inference, our open source computer vision inference server.
Inference can be run as a microservice with Docker, ideal for large deployments where you may need a centralized server on which to run inference, or when you want to run Inference in an isolated container. You can also directly integrate Inference into your project through the Inference Python SDK.
For this guide, we will show how to deploy the model with the Python SDK.
First, install inference:
!pip install inference
Then, create a new Python file and add the following code:
import os
from inference import get_model
from PIL import Image
import json
lora_model = get_model("model-id/version-id", api_key="KEY")
image = Image.open("containers.png")
response = lora_model.infer(image)
print(response)
In the code above, we load our model, run it on an image, then print the response.
When you first run the code, your model weights will be downloaded and cached on your device for subsequent runs. This may take a few minutes depending on the speed of your internet connection.
Congratulations
⭐️ If you enjoyed this notebook, star the Roboflow Notebooks repo (and supervision while you’re at it) and let us know what tutorials you’d like to see us do next. ⭐️
